Abstract. This paper defines the basis of a new hierarchical framework for segmentation algorithms based on energy minimization schemes. This new framework is based on two formal tools. First, a combinatorial pyramid efficiently encodes a hierarchy of partitions. Second, discrete geometric estimators precisely measure some important geometric parameters of the regions. These measures, combined with photometric and topological features of the partition, allow us to design energy terms based on discrete measures. Our segmentation framework exploits these energies to build a pyramid of image partitions with a minimization scheme. Some experiments illustrating our framework are shown and discussed.

1 Introduction 

The convergence of energy minimization and hierarchical segmentation algorithms provides a rich framework for image segmentation. This framework is based on an objective criterion, called energy, whose minimization defines a salient partition according to a given problem. The energy of a partition is generally decomposed by summation over each region as a weighted sum of two terms, $E(R) = E_{img}(R) + \nu E_{reg}(R)$, where $E_{img}$ may be understood as a fit to the data within the region while $E_{reg}$ corresponds to a regularization term. The parameter $\nu$ defines the respective weights of the two terms. The Mumford-Shah energy is a classical instance of this approach [1]. Such an equation may also be interpreted within the Minimum Description Length (MDL) framework [2], where the two energies $E_{img}$ and $E_{reg}$ represent respectively the encoding costs of the photometry and the geometry of a region.

Several methods have been proposed in order to obtain a partition minimizing an energy. These methods include the level set approach [1], graph cuts [3] and methods based on a region merging scheme [4-7]. Defining a meaningful segmentation with an energy minimization framework and a merge scheme first requires a merge strategy. If the parameter $\nu$ is fixed, a near-optimal strategy consists in merging, at each step, the two regions whose union induces the greatest decrease of the energy, until any merge would increase the energy. The obtained partition is said to be 2-normal at scale $\nu$ [4,5]. An alternative strategy [6] consists in merging, at each step, the two regions whose union would belong to the 2-normal partition of lowest scale. This reduction framework avoids the need to select a vector of $\nu$ parameters encoding a priori the different scales of interest. However, previous works [4-6] were based on a sequence of merge operations combined with a stopping criterion (number of regions, maximal value of $\nu$, ...). Guigues et al. [7] explicitly encode the hierarchy of partitions using a reduction scheme similar to [6], but use the hierarchy to build, for any value of $\nu$, the optimal partition that may be defined from the hierarchy. Moreover, instead of starting from the grid of pixels like [6], their initial partition is an over-partition of the image, which presents two fundamental advantages. First, the initial over-segmented partition allows reliable statistics to be computed on regions. Second, it restricts the set of possible partitions and thus reduces the risk of being trapped in a local minimum.
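As a minimal sketch of the fixed-$\nu$ merge criterion above, assume a Mumford-Shah-like piecewise-constant model in which $E_{img}$ is the squared error of a region and the regularization term is the total length of the partition boundaries, so that merging two adjacent regions removes their common boundary of length $\ell$. Merging then changes the energy by $\Delta E_{img} - \nu\,\ell$, which is negative exactly when $\nu$ exceeds the threshold $\Delta E_{img} / \ell$ computed below. The names `sse` and `merge_scale` are illustrative, and regions are plain lists of grey levels.

```python
def sse(values):
    """Squared error of a region around its mean grey level."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def merge_scale(region_a, region_b, common_length):
    """Threshold on nu above which merging two adjacent regions lowers the energy.

    Assumed model: E = sum of region squared errors + nu * total boundary length.
    Merging changes E by delta_img - nu * common_length, with
    delta_img = SSE(A u B) - SSE(A) - SSE(B) >= 0.
    """
    delta_img = sse(region_a + region_b) - sse(region_a) - sse(region_b)
    return delta_img / common_length


# Two one-pixel regions (grey levels 0 and 10) sharing one unit linel.
print(merge_scale([0.0], [10.0], 1.0))  # 50.0
```

Under these assumptions, a partition in which every adjacent pair has a threshold of at least $\nu$ admits no energy-decreasing merge, which is the 2-normal condition at scale $\nu$.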

The second problem that a segmentation algorithm should address is the correct design of the energy terms. For instance, the classical Mumford-Shah energy simply combines the squared error of each region with the total length of the partition boundaries. However, as shown by several authors [7], more complex models (both geometric and photometric) may capture finer definitions of salient partitions. Their design requires fitting geometric models onto regions. Efficient access to the set of boundaries of each region and to their geometry is thus compulsory. However, classical hierarchical segmentation frameworks are not adequate for this task. Adaptive pyramids based on graphs [8] do not present a one-to-one correspondence between region adjacencies and geometric boundaries: reconstructing the geometry of a region is then tricky. Dual graphs [9] behave better for this task, but the explicit encoding of all reduced graphs restricts the number of merge steps.

This paper provides a new framework that addresses the design of new energy terms based on geometric and photometric features. The stack of successively reduced partitions is encoded using a combinatorial pyramid [10]. A very fine granularity for the hierarchy is then achieved, since regions are merged two by two and a new level of the pyramid is created for each merge operation. Geometric features are computed on each partition of the hierarchy using discrete geometric estimators of normal and length. This framework thus offers a compact and efficient encoding of the hierarchy together with efficient access to the geometric and topological properties of the partition. It therefore comes as a natural complement to methods searching for optimal partitions. The paper is structured as follows. Section 2 presents the combinatorial pyramid model. The application of this model to compute geometric features on regions using discrete geometric estimators is presented in Section 3. Section 4 then presents an energy based on discrete estimators together with some experiments.

2 Combinatorial Pyramids 

This paper is based on combinatorial maps [11]. A combinatorial map may be seen as a planar graph explicitly encoding the orientation of edges around a given vertex. To do so, each edge of a planar graph is split into two half-edges called darts (e.g. darts 16 and $-24$ in Fig. 1c). Since each edge connects two vertices, each dart belongs to only one vertex. A combinatorial map is formally defined by a triplet $G = (\mathcal{D}, \sigma, \alpha)$, where $\mathcal{D}$ represents the set of darts and $\sigma$ is a permutation on $\mathcal{D}$ whose cycles correspond to the sequence of darts encountered when turning counter-clockwise around each vertex. Finally, $\alpha$ is an involution on $\mathcal{D}$ which maps each of the two darts of one edge to the other one (e.g. $\alpha$ maps 16 to $-24$ and $-24$ to 16 in Fig. 1c). The cycles of $\alpha$ and $\sigma$ containing a dart $d$ will be respectively denoted by $\alpha^*(d)$ and $\sigma^*(d)$.

Fig. 1. A dual combinatorial map (a) encoding a 3 x 3 grid, with the contracted combinatorial map (b) obtained by contracting the contraction kernel (CK) $K_1 = \alpha^*(1, 2, 10, 11, 12, 6)$. The reduced combinatorial map (c) is obtained by removing the empty self loops defined by the RKESL $K_2 = \alpha^*(4)$ and the removal kernel of empty double edges (RKEDE) $K_3 = \alpha^*(13, 14, 15, 19, 18, 22) \cup \{24, -16, 17, -20, 21, -23, 3, -5\}$.

Given a combinatorial map $G = (\mathcal{D}, \sigma, \alpha)$, its dual map is defined by $\bar{G} = (\mathcal{D}, \varphi, \alpha)$ with $\varphi = \sigma \circ \alpha$. The cycles of the permutation $\varphi$ encode the faces of the combinatorial map and may be interpreted as the sequence of darts encountered when turning clockwise around a face. The cycle of $\varphi$ containing a dart $d$ will be denoted by $\varphi^*(d)$.
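The triplet $(\mathcal{D}, \sigma, \alpha)$ and its dual can be sketched with plain Python dictionaries. The toy map below is an assumption chosen for brevity, not one of the maps of Fig. 1: two vertices joined by two edges, with darts $\pm 1$ and $\pm 2$ and $\alpha(d) = -d$. The sketch checks that $\alpha$ is a fixed-point-free involution, lists the vertices as $\sigma$-cycles, derives $\varphi = \sigma \circ \alpha$, and lists the faces as $\varphi$-cycles; the Euler characteristic $V - E + F$ of this planar map is 2.

```python
def orbit(perm, d):
    """Cycle of the permutation `perm` (a dict) containing dart d, e.g. sigma*(d)."""
    cycle, x = [d], perm[d]
    while x != d:
        cycle.append(x)
        x = perm[x]
    return cycle


def cycles(perm):
    """All cycles of a permutation, visited in sorted dart order for determinism."""
    seen, result = set(), []
    for d in sorted(perm):
        if d not in seen:
            c = orbit(perm, d)
            seen.update(c)
            result.append(c)
    return result


# Hypothetical toy map: vertex A = (1, 2), vertex B = (-1, -2).
sigma = {1: 2, 2: 1, -1: -2, -2: -1}
alpha = {1: -1, -1: 1, 2: -2, -2: 2}
assert all(alpha[alpha[d]] == d != alpha[d] for d in alpha), "alpha must be a fixed-point-free involution"

# Dual map: phi = sigma o alpha, i.e. phi(d) = sigma(alpha(d)).
phi = {d: sigma[alpha[d]] for d in sigma}

vertices, edges, faces = cycles(sigma), cycles(alpha), cycles(phi)
print(vertices)  # [[-2, -1], [1, 2]]
print(faces)     # [[-2, 1], [-1, 2]]
print(len(vertices) - len(edges) + len(faces))  # 2
```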



2.1 Combinatorial map encoding of a planar sampling grid 



Combinatorial maps can also encode the low-level geometry of image pixels. Indeed, Fig. 1a describes a dual combinatorial map $\bar{G}_0 = (\mathcal{D}_0, \varphi_0, \alpha_0)$ encoding a 3 x 3 4-connected planar sampling grid. The $\varphi$, $\alpha$ and $\sigma$ cycles of each dart may be respectively understood as elements of dimensions 0, 1 and 2, and formally associated with a 2D cellular complex [10]. More precisely, each $\alpha_0$ cycle may be associated with a linel (sometimes also called crack or surfel) between two pixels. Each of the two darts of an $\alpha_0$ cycle corresponds to an orientation along the linel. For example, the cycle $\alpha_0^*(1) = (1, -1)$ is associated with the linel encoding the right border of the top-left pixel of the 3 x 3 grid (Fig. 1a). Darts 1 and $-1$ define respectively a bottom-to-top and a top-to-bottom orientation along the linel.
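To make the cell counting concrete, the sketch below enumerates the linels of a $W \times H$ pixel grid, outer border included, together with the two oriented darts of each linel. The coordinate convention (pixel $(i, j)$ covers $[i, i+1] \times [j, j+1]$, $y$ pointing up) and the Freeman coding 0: $+x$, 1: $+y$, 2: $-x$, 3: $-y$ are assumptions made for illustration; the actual dart numbering of Fig. 1a is not reproduced. For a 3 x 3 grid this gives 24 linels, consistent with the dart labels up to 24 that appear in Fig. 1.

```python
def grid_darts(width, height):
    """Oriented darts of all linels of a width x height pixel grid, outer border included.

    Pixel (i, j) covers [i, i+1] x [j, j+1]. Each dart is (tail, head, freeman_code)
    with Freeman codes 0: +x, 1: +y, 2: -x, 3: -y (assumed convention).
    """
    darts = []
    for x in range(width + 1):        # vertical linels
        for y in range(height):
            darts.append(((x, y), (x, y + 1), 1))
            darts.append(((x, y + 1), (x, y), 3))
    for y in range(height + 1):       # horizontal linels
        for x in range(width):
            darts.append(((x, y), (x + 1, y), 0))
            darts.append(((x + 1, y), (x, y), 2))
    return darts


darts = grid_darts(3, 3)
print(len(darts) // 2, len(darts))  # 24 48: linels, then darts
```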
2.2 Construction of Combinatorial Pyramids

A combinatorial pyramid is defined by an initial combinatorial map successively reduced by a sequence of contraction or removal operations. Contraction operations are encoded by contraction kernels (CK). These kernels, defined as a forest of the current combinatorial map, may however create redundant edges such as empty self loops and double edges. Empty self loops (edge $\alpha^*(4)$ in Fig. 1b) may be interpreted as region inner boundaries and are removed by a removal kernel of empty self loops (RKESL) after the contraction step. The remaining redundant edges, called double edges, belong to degree 2 vertices in $\bar{G}$ (e.g. $\varphi_1^*(13)$, $\varphi_1^*(14)$, $\varphi_1^*(15)$ in Fig. 1b) and are removed using a removal kernel of empty double edges (RKEDE), which contains all darts incident to a degree 2 dual vertex. Further details about the construction scheme of a combinatorial pyramid may be found



As mentioned in Section 2.1, if the initial combinatorial map encodes a planar sampling grid, the geometric embedding of each initial dart corresponds to an oriented linel. Moreover, each dart of a reduced map that is not a self loop encodes a connected boundary between two regions. The embedding of the boundary associated with such a dart may be retrieved from the embedding of the darts of the initial map $G_0$. Let us consider the reduced combinatorial map $G_i = (\mathcal{D}_i, \sigma_i, \alpha_i)$ defined at level $i$ and one dart $d \in \mathcal{D}_i$ which is not a self loop. The sequence $d_1, \dots, d_n$ of initial darts encoding the embedding of the dart $d$ is obtained from the receptive field of $d$ [10] within $G_0$ using the following relation:

$$d_1 = d, \qquad d_{j+1} = \varphi_0^{m_j}(\alpha_0(d_j)) \quad \text{for } j \in \{1, \dots, n-1\} \qquad (1)$$

where $\bar{G}_0 = (\mathcal{D}_0, \varphi_0, \alpha_0)$ is the dual of the initial combinatorial map and $m_j$ is the smallest integer $q$ such that $\varphi_0^q(\alpha_0(d_j))$ survives at level $i$ or belongs to some former RKEDE. The dart $d_n$ is the first dart defined by Eq. (1), other than $d_1 = d$, which survives up to level $i$. This dart also satisfies $\alpha_0(d_n) = \alpha_i(d)$ by construction of the receptive fields. Note that the tests performed on $\varphi_0^q(\alpha_0(d_j))$, $q \in \{1, \dots, m_j\}$, to determine whether it is equal to $d_{j+1}$ or $d_n$ are performed in constant time using the implicit encoding of combinatorial pyramids [10].
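Eq. (1) can be sketched as a short loop. The initial map is given by its dual permutation $\varphi_0$ (a dictionary) and $\alpha_0(d) = -d$, as for the grid map of Section 2.1, while `survives` and `in_rkede` are hypothetical predicates standing in for the constant-time tests provided by the implicit pyramid encoding [10]. The toy fragment of $\varphi_0$ is an assumption: apart from the value $\varphi_0(-16) = 15$ given in the text, its entries (and the extra contracted dart 99) were chosen only to reproduce the sequence 16.15.14.13.24 and to exercise an exponent $m_j > 1$.

```python
def embedding_darts(d, phi0, survives, in_rkede, max_steps=100_000):
    """Sketch of Eq. (1): initial darts d_1, ..., d_n embedding the surviving dart d.

    Assumes alpha_0(x) = -x. d_{j+1} = phi0^{m_j}(alpha_0(d_j)), where m_j is the
    smallest q >= 1 such that phi0^q(alpha_0(d_j)) survives or lies in a former RKEDE.
    The sequence stops at the first such dart (after d_1) that survives.
    """
    sequence = [d]
    x = phi0[-d]  # phi0^1(alpha_0(d_1))
    for _ in range(max_steps):
        if survives(x) or in_rkede(x):
            sequence.append(x)  # x is d_{j+1}
            if survives(x):
                return sequence
            x = phi0[-x]  # restart with q = 1 from alpha_0(d_{j+1})
        else:
            x = phi0[x]  # increase the exponent q
    raise RuntimeError("no surviving dart reached; inconsistent input")


# Hypothetical fragment of phi0; dart 99 stands for a contracted dart.
phi0 = {-16: 15, -15: 99, 99: 14, -14: 13, -13: 24}
seq = embedding_darts(16, phi0,
                      survives=lambda x: x in {16, 24},
                      in_rkede=lambda x: x in {13, 14, 15})
print(seq)  # [16, 15, 14, 13, 24]
```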

2.3 Embedding of region boundaries 

Let us consider the dart 16 in Fig. 1c. This dart encodes the border between the background and the first row of the 3 x 3 grid, encoded by the cycle $\sigma_3^*(16) = (16, 7, 8)$ of $G_3$. The sequence of initial darts encoding the boundary of the dart 16 is retrieved using Eq. (1) and is equal to 16.15.14.13.24 (Fig. 1b). We have for example $15 = \varphi_0(\alpha_0(16)) = \varphi_0(-16)$ (Fig. 1c). Since each initial dart is associated with an oriented linel, one may associate a sequence of Freeman codes with each sequence of initial darts (Fig. 1b), and thus with each dart of a reduced combinatorial map $G_i$. The sequence of Freeman codes associated with a dart $d$ is denoted $s_d$ and is called the segment associated with $d$. For example, the segment associated with the dart 16 is equal to $s_{16} = 1.2.2.2.3$.
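A segment $s_d$ can be turned back into the pointels it visits. The sketch assumes the usual Freeman convention 0: $+x$, 1: $+y$, 2: $-x$, 3: $-y$ with $y$ pointing up, and an arbitrarily chosen start pointel; with pixel rows spanning $0 \le y \le 3$, the segment $s_{16} = 1.2.2.2.3$ then runs up the right side of the top row, along its top and down its left side, which is consistent with the border between the background and the first row described above. The helper names are illustrative.

```python
FREEMAN_STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def trace(start, codes):
    """Pointels visited by a Freeman chain starting at pointel `start`."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = FREEMAN_STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def is_closed(codes):
    """True if the chain returns to its starting pointel (a loop)."""
    return trace((0, 0), codes)[-1] == (0, 0)


s16 = [1, 2, 2, 2, 3]
print(trace((3, 2), s16))  # [(3, 2), (3, 3), (2, 3), (1, 3), (0, 3), (0, 2)]
print(is_closed(s16))      # False: s_16 is only one piece of a region boundary
```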
Fig. 2. The central white region $\sigma^*(1)$ (a) contains several subregions. Its boundary is thus split into several connected components, connected by bridges in $\bar{G}$ (b). These edges correspond to self loops in $G$ (c).

3 Discrete geometry over a partition 

As mentioned in Section 2, each edge $(d, \alpha(d))$ of a partition $G$ that is not a self loop encodes a connected boundary between two regions. Such an edge is called separating. On the other hand, a self loop corresponds to a bridge in the dual combinatorial map and is characterized by $\alpha(d) \in \sigma^*(d)$ (e.g. edges $(3, -3)$ or $(5, -5)$ in Fig. 2b,c). Such edges, called fictive, either connect the outer boundary to some inner boundary (e.g. edge $(3, -3)$ in Fig. 2) or connect two inner boundaries (edge $(5, -5)$ in Fig. 2) [10].
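The characterization $\alpha(d) \in \sigma^*(d)$ gives a direct test for fictive edges. The toy map below is hypothetical: its first vertex, $(1, 2, 3, -3, 8)$, imitates the situation of dart 3 in Fig. 2, where both darts of the edge $(3, -3)$ lie around the same vertex of $G$, while the second vertex simply collects the remaining darts.

```python
def orbit(perm, d):
    """Cycle of the permutation `perm` (a dict) containing dart d."""
    cycle, x = [d], perm[d]
    while x != d:
        cycle.append(x)
        x = perm[x]
    return cycle


def is_fictive(d, sigma, alpha):
    """An edge (d, alpha(d)) is a self loop of G (fictive) iff alpha(d) is in sigma*(d)."""
    return alpha[d] in orbit(sigma, d)


sigma = {1: 2, 2: 3, 3: -3, -3: 8, 8: 1, -1: -8, -8: -2, -2: -1}
alpha = {d: -d for d in sigma}
print([d for d in (1, 2, 3, 8) if is_fictive(d, sigma, alpha)])  # [3]
```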

Each separating edge is embedded as a 4-connected digital path, included in the interpixel digital plane (Section 2.3 and [10]). When estimating the geometry of the boundary of a region, fictive edges do not play any role. More precisely, the concatenation of the separating edges alone also defines a set of 4-connected digital loops. Each of these loops is either the outer boundary of the region or one of its inner boundaries [10]. Given an initial dart $d$ belonging to a separating edge, Algorithm 1 extracts a boundary between the region $\sigma^*(d)$ and its complement (setting $L_{in} = \sigma^*(d)$), or between the union of the regions $\sigma^*(d)$ and $\sigma^*(d')$ and its complement (setting $L_{in} = \sigma^*(d) \cup \sigma^*(d')$). Its principle is to follow the boundary with $\sigma$, except that it skips fictive edges and edges between $\sigma^*(d)$ and $\sigma^*(d')$. This method for tracking a boundary is easily understood in Fig. 2b, where for instance the algorithm starts from dart 1, then visits $2 = \sigma(1)$; dart 3 is skipped since $-3 \in \sigma^*(1)$; the algorithm then visits $8 = \sigma(-3)$ and terminates on $1 = \sigma(8)$ again. Extracting all the boundaries of a region is done in a similar way. All these algorithms can be implemented with a complexity linear in the number of boundary linels.

3.1 Geometry with digital straight segments 

We may now examine how geometric quantities can be estimated on a closed 4-connected digital contour $C$, which is some boundary of a region or of two adjacent regions (computed as in the previous paragraph). The literature on this topic is abundant, and we restrict ourselves to pure discrete geometry tools based on digital straight segment (DSS) recognition. Several equivalent definitions of DSS exist, together with several classes of algorithms to recognize them on digital curves (see for instance [12] for a recent survey). We chose here to briefly present the arithmetic point of view on digital lines, which leads to rather simple and efficient algorithms [13,14].
Algorithm 1. Algorithm to visit all the linels of the digital boundary encircling the region(s) specified by their darts $L_{in}$ and containing the dart $d$.

Function Map::boundary( dart d, darts L_in ) : Freeman chain
Require: d ∈ L_in
Ensure: Return a sequence of Freeman codes that is a 4-connected loop.
  list C ← ∅, dart b ← d
  repeat
    C.append(s_b)
    b ← σ(b)
    while α(b) ∈ L_in do   {Skip fictive or interior edges}
      b ← φ(b)   {i.e. b ← σ(α(b))}
    end while
  until b = d
  return C
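A runnable Python transcription of Algorithm 1 is sketched below, with permutations stored as dictionaries and the step $b \leftarrow \varphi(b)$ written as $\sigma(\alpha(b))$. The toy map and the Freeman segments attached to darts 1, 2 and 8 are hypothetical; they reproduce the walk described for Fig. 2b (dart 1, then 2, dart 3 skipped, then 8, back to 1) and form a closed unit square so that the output can be checked. The extra check that $\alpha(d) \notin L_{in}$ encodes the assumption, stated in the text, that $d$ lies on a separating edge.

```python
def boundary(d, l_in, sigma, alpha, segment):
    """Freeman chain of the boundary encircling the darts l_in and containing d.

    sigma, alpha: permutations as dicts; segment[b]: Freeman codes s_b of dart b.
    """
    if d not in l_in or alpha[d] in l_in:
        raise ValueError("d must lie in l_in on a separating edge")
    chain, b = [], d
    while True:  # repeat ... until b == d
        chain.extend(segment[b])  # C.append(s_b)
        b = sigma[b]
        while alpha[b] in l_in:  # skip fictive or interior edges
            b = sigma[alpha[b]]  # b <- phi(b)
        if b == d:
            return chain


sigma = {1: 2, 2: 3, 3: -3, -3: 8, 8: 1, -1: -8, -8: -2, -2: -1}
alpha = {d: -d for d in sigma}
l_in = {1, 2, 3, -3, 8}                # sigma*(1)
segment = {1: [0], 2: [1], 8: [2, 3]}  # hypothetical segments s_b
print(boundary(1, l_in, sigma, alpha, segment))  # [0, 1, 2, 3]
```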




Fig. 3. Left: every maximal segment along this contour is drawn as its rectangular bounding box. Right: λ-maximal segment tangent estimation at a given point.



The set of points $(x, y)$ of the digital plane verifying $\mu \le ax - by < \mu + |a| + |b|$, with $a$, $b$ and $\mu$ integers, is called the standard line with slope $a/b$ and shift $\mu$. A standard line is always 4-connected. A sequence of consecutive points $C_{i,j}$ of the digital curve $C$, indexed from $i$ to $j$, is a digital straight segment (DSS) iff there exists a standard line $(a, b, \mu)$ containing them. The one with the smallest $|a| + |b|$ determines its characteristics, in particular its slope $a/b$. Any DSS $Z$ thus defines an angle $\theta(Z)$ between its carrying standard line and the x-axis (in $[0, 2\pi[$ since a DSS is oriented), called the direction of $Z$.
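Membership in a standard line is a single double inequality. The sketch below checks it for a hand-made 4-connected path lying in the standard line $(a, b, \mu) = (1, 2, 0)$, i.e. $0 \le x - 2y < 3$; the path and the function name are illustrative.

```python
def in_standard_line(point, a, b, mu):
    """True iff mu <= a*x - b*y < mu + |a| + |b|."""
    x, y = point
    return mu <= a * x - b * y < mu + abs(a) + abs(b)


path = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (4, 1), (4, 2)]
print(all(in_standard_line(p, 1, 2, 0) for p in path))  # True
print(in_standard_line((3, 0), 1, 2, 0))  # False: x - 2y = 3 is out of range
```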

The predicate "$C_{i,j}$ is a DSS" is denoted by $S(i, j)$. Incremental algorithms exist to recognize a digital straight segment on a curve and to extract its characteristics [13]. Therefore, deciding $S(i, j+1)$ or $S(i-1, j)$ from $S(i, j)$ is an $O(1)$ operation. Any DSS $C_{i,j}$ is called a maximal segment iff $\neg S(i, j+1)$ and $\neg S(i-1, j)$. Maximal segments are thus the inextensible DSS of the curve (Fig. 3, left). Note that the set of all maximal segments of a curve can be computed in time linear in the number of curve points [14].
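The definitions of $S(i, j)$ and of maximal segments can be checked with a deliberately naive sketch. Instead of the incremental arithmetic recognition of [13] and the linear-time algorithm of [14], `is_dss` below searches all integer pairs $(a, b)$ with $|a|, |b| \le n$ (a bound assumed large enough for these small examples), taking $\mu$ as the minimum of $ax - by$; this brute force is a stand-in chosen for readability, not either of those algorithms. It also handles an open curve only, whereas the contours of Section 3 are closed. Since a sub-sequence of a DSS is a DSS, the furthest reachable end `reach[i]` is non-decreasing in $i$, and $C_{i,j}$ with $j$ = `reach[i]` is maximal iff $i = 0$ or `reach[i-1]` $< j$.

```python
def is_dss(points):
    """Brute-force S(i, j): some standard line (a, b, mu) contains all the points."""
    n = len(points)
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            if a == 0 and b == 0:
                continue
            values = [a * x - b * y for x, y in points]
            # With mu = min(values), the points fit iff the spread is < |a| + |b|.
            if max(values) - min(values) < abs(a) + abs(b):
                return True
    return False


def maximal_segments(curve):
    """Maximal segments (i, j) of an open digital curve (naive two-pointer scan)."""
    n = len(curve)
    reach = []
    for i in range(n):
        j = max(i, reach[-1]) if reach else i  # C_{i..reach[i-1]} is already a DSS
        while j + 1 < n and is_dss(curve[i:j + 2]):
            j += 1
        reach.append(j)
    return [(i, reach[i]) for i in range(n) if i == 0 or reach[i - 1] < reach[i]]


corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(maximal_segments(corner))  # [(0, 3), (1, 4)]
```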
3.2 Tangent, normal and length estimation

Several tangent estimators based on DSS recognition have been proposed. We propose to use the λ-maximal segment tangent estimator (λ-MST) to approximate the tangent direction at any point of the digital curve [15]. It was indeed shown to give good approximations even at coarse scale, to be rather independent of rotations, and to be asymptotically convergent.

Fig. 3, right, gives the essential idea of this tangent estimator. Given a point, the direction $\theta_i$ of every maximal segment containing it is evaluated. The relative position $e_i$ of the point within each maximal segment is also computed. The λ-MST tangent direction $\theta$ is a weighted combination of these parameters: $\theta = \sum_i \lambda(e_i)\,\theta_i \,/\, \sum_i \lambda(e_i)$. In our experiments, the mapping $\lambda$ was defined as the triangle function taking base value 0 at 0 and 1, and peak value 1 at 1/2. For further details, see [15].
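The λ-MST combination can be sketched directly from the formula above. Each maximal segment covering the point is given as a pair $(\theta_i, e_i)$; how the $\theta_i$ are derived from the segment characteristics is left out, and the sketch assumes the $\theta_i$ do not straddle the $0 / 2\pi$ cut, so that a plain weighted mean of angles is meaningful. The fallback to an unweighted mean when all weights vanish is an added assumption, not part of the method as described.

```python
def triangle(e):
    """lambda: 0 at e = 0 and e = 1, peak value 1 at e = 1/2."""
    return max(0.0, 1.0 - abs(2.0 * e - 1.0))


def eccentricity(k, start, end):
    """Relative position e in [0, 1] of index k within a maximal segment [start, end]."""
    return (k - start) / (end - start)


def lambda_mst(segments):
    """theta = sum(lambda(e_i) * theta_i) / sum(lambda(e_i)) over covering segments."""
    weights = [triangle(e) for _, e in segments]
    total = sum(weights)
    if total == 0.0:  # point at an end of every segment: unweighted mean (assumption)
        return sum(theta for theta, _ in segments) / len(segments)
    return sum(w * theta for w, (theta, _) in zip(weights, segments)) / total


print(eccentricity(3, 1, 5))                  # 0.5
print(lambda_mst([(0.2, 0.5), (0.4, 0.25)]))  # (1.0 * 0.2 + 0.5 * 0.4) / 1.5
```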

The experimental average number of maximal segments per linel is between 3 and 4. Therefore, computing the λ-MST direction is not costly and is an $O(1)$ operation on average. This tangent estimation technique is easily extended to any real curvilinear abscissa along the digital contour. The tangent is thus defined at any linel, taking half-integer abscissas.

The estimation of the normal vector at $C_k$ is then simply the vector $\vec{n}(k) = (-\sin\theta(k), \cos\theta(k))$. The elementary length $l(k, k+1)$ of a linel $C_{k,k+1}$ is defined as $|\cos\theta(k + 0.5)|$ for horizontal linels and $|\sin\theta(k + 0.5)|$ for vertical linels. It corresponds to an estimation of the length of a unit displacement along the digital curve. The length of $C$ is estimated by simple summation of the elementary lengths of its linels. This method of length evaluation was reported to give very good experimental results [16]. If $C_b$ is the boundary returned by Algorithm 1 for the dart $b$ and $L_{in} = \sigma^*(b)$, then its length is $L(b; G) = \sum_k l(k, k+1; C_b)$. The total perimeter $Per(R(\sigma^*(b)); G)$ of the region $\sigma^*(b)$ is the sum of the lengths of each of its boundaries.
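The normal and length formulas above can be sketched as follows. The Freeman codes are assumed to follow 0: $+x$, 1: $+y$, 2: $-x$, 3: $-y$, so that codes 0 and 2 mark horizontal linels, and the tangent angle $\theta(k + 0.5)$ of each linel is supplied directly instead of being computed by λ-MST. For a diagonal staircase with the exact tangent $\pi/4$, the estimate recovers the Euclidean length $4\sqrt{2}$ of the segment from $(0, 0)$ to $(4, 4)$, whereas counting linels would give 8.

```python
import math


def normal(theta):
    """Unit normal (-sin theta, cos theta) to the tangent direction theta."""
    return (-math.sin(theta), math.cos(theta))


def contour_length(codes, thetas):
    """Sum of elementary lengths l(k, k+1): |cos| on horizontal, |sin| on vertical linels.

    codes[k] is the Freeman code of linel C_{k,k+1}; thetas[k] is theta(k + 0.5).
    """
    total = 0.0
    for code, theta in zip(codes, thetas):
        if code in (0, 2):  # horizontal linel
            total += abs(math.cos(theta))
        else:  # vertical linel
            total += abs(math.sin(theta))
    return total


staircase = [0, 1] * 4
print(contour_length(staircase, [math.pi / 4] * len(staircase)))  # about 5.657
print(4 * math.sqrt(2))                                           # about 5.657
```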



4 Energy of a partition and pyramidal segmentation 

The geometric features (normal, perimeter, polygonalization) defined in Section 3 may be computed on each region of a partition in order to provide different measures of its geometric characteristics. Such measures may then be incorporated into a hierarchical segmentation algorithm based on an energy minimization scheme (Section 1). Such an energy balances two terms: a goodness-of-fit term and a regularization term which penalizes unlikely or complex models. The energy of a partition encoded by the map $G$ is simply called the energy of the combinatorial map $G$ and is formally defined as follows. Let $G = (\mathcal{D}, \sigma, \alpha)$ be a combinatorial map with a geometric embedding in the digital grid and an input image $I$ over this grid. Let $\mathcal{D}_\sigma$ be the set of $\sigma$-cycles of $\mathcal{D}$. The energy of the combinatorial map $G$ is

$$E(G) = \sum_{\sigma^*(d) \in \mathcal{D}_\sigma} E(\sigma^*(d)) \qquad (2)$$

$$E(\sigma^*(d)) = E_{img}(\sigma^*(d)) + \nu E_{reg}(\sigma^*(d)) \qquad (3)$$

Fig. 4. Influence of length penalization: (a) image Girl, (b) one level of the pyramid built with $l(k, k+1) = 1$, (c) the same level of a pyramid built using discrete length estimators. All the boundaries of the pyramid containing (c) are superimposed on (d). The darkest boundaries are those that survive at the highest levels.

Eq. (2) indicates that the global energy is decomposable over the regions. This property helps in defining fast algorithms for region decimation. Eq. (3) balances two energies: one dependent on the image (the image energy $E_{img}$), the other dependent only on the model (the regularization energy $E_{reg}$).
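Eqs. (2) and (3) translate into a sum over regions. In this sketch each region is reduced to its two precomputed energies (a hypothetical `Region` record), and the numbers are made up. The point illustrated is decomposability: the energy variation of merging two regions only involves those two regions and their union, which is what keeps greedy decimation cheap.

```python
from dataclasses import dataclass


@dataclass
class Region:
    e_img: float  # image energy E_img of the region
    e_reg: float  # regularization energy E_reg (e.g. its estimated perimeter)


def region_energy(r, nu):
    """Eq. (3): E = E_img + nu * E_reg."""
    return r.e_img + nu * r.e_reg


def partition_energy(regions, nu):
    """Eq. (2): the energy of the map is the sum of its region energies."""
    return sum(region_energy(r, nu) for r in regions)


def merge_delta(a, b, union, nu):
    """Energy variation of replacing regions a and b by their union."""
    return region_energy(union, nu) - region_energy(a, nu) - region_energy(b, nu)


a, b = Region(10.0, 4.0), Region(2.0, 6.0)
print(partition_energy([a, b], nu=1.0))              # 22.0
print(merge_delta(a, b, Region(15.0, 8.0), nu=1.0))  # 1.0 (hypothetical union energies)
```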

The parameter $\nu$ is often interpreted as a scale parameter, since it favors the goodness of fit for low values (leading to over-segmentation) and a priori most likely regions for high values (leading to under-segmentation).

The image energy used in our experiments is defined as follows:

$$E_{img}(\sigma^*(d)) = -\delta \sum_{k} \|DI(C_k)\|\, l(k, k+1) + \sum_{(x,y) \in R(\sigma^*(d))} \|I(x, y) - \mu_{\sigma^*(d)}\|^2$$

where the first sum runs over the boundary linels of the region, $l(k, k+1)$ denotes the length estimate of a linel at point $k$, $I(x, y)$ denotes the color of the pixel $(x, y)$ and $\|DI(C_k)\|$ the norm of the differential of $I$ at point $C_k$. This last measure is equal to the norm of the gradient for grey-level images. The term $\mu_{\sigma^*(d)}$ represents the mean color of the region encoded by $\sigma^*(d)$. The second sum of the above expression thus denotes the squared error of the region. Finally, the term $\delta$ sets the relative weight of the gradient and squared-error energies.
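For a grey-level image, where $\|DI\|$ reduces to the gradient norm and the color difference to a scalar difference, the image energy can be sketched as below. The gradient norms and elementary lengths of the boundary linels are assumed to be given as pairs; computing them (gradient filter, λ-MST lengths) is outside the sketch, and the function name is illustrative.

```python
def image_energy(pixels, boundary, delta):
    """E_img = -delta * sum ||DI(C_k)|| l(k, k+1) + sum (I(x, y) - mean)^2.

    pixels: grey levels of the region; boundary: (gradient_norm, length) per linel.
    """
    mean = sum(pixels) / len(pixels)
    squared_error = sum((v - mean) ** 2 for v in pixels)
    gradient_term = sum(g * l for g, l in boundary)
    return -delta * gradient_term + squared_error


print(image_energy([1.0, 2.0, 3.0], [(4.0, 1.0), (2.0, 0.5)], delta=0.5))  # -0.5
```

A negative gradient term rewards regions whose boundaries follow strong image gradients, while the squared error penalizes heterogeneous regions.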

The regularization energy is defined from the estimate of the perimeter of the region as $E_{reg}(\sigma^*(d)) = Per(R(\sigma^*(d)); G)$ (Section 3.2). Given two possible merge operations inducing the same variation of the image energy, this choice of regularization term favors the one which induces the simpler partition, with the lower overall contour length. The advantage of using discrete length estimators compared to a basic count of the linels is to make the segmentation process less dependent on the alignment of components with respect to the axes.
We tested the influence of length penalization on the classical Girl test image (Fig. 4). Two pyramids have been built on an initial partition encoded by a combinatorial map $G_0$. This partition is defined by a watershed algorithm applied to the gradient of the Girl test image. The parameter $\nu$ is fixed to 1.3 during the construction of both pyramids. Fig. 4(b) represents one level of the first pyramid, built using a fixed length estimate equal to 1 for all linels. Fig. 4(c) represents the same level within the second pyramid, built using the discrete length estimator defined in Section 3.2. As shown by Fig. 4(c), the more accurate length measure given by the discrete length estimator provides smoother boundaries.



Pyramidal segmentation algorithm 

Our energy minimization method starts with an initial partition encoded by a map, and merges at each step the two adjacent regions whose merging induces the greatest decrease (or the smallest increase) of the combinatorial map energy. This process may be interpreted as a gradient descent which continues when a local minimum is reached in order to seek other minima. Note that our framework is not tied to a specific energy minimization strategy. Many alternative optimization heuristics could be used (e.g. the scale-climbing of Guigues et al. [7]). The proposed approach is however sufficient to compare the respective advantages of different energies. Let us additionally note that, using our strategy or the scale climbing of Guigues et al., only two regions are merged between two consecutive levels of the pyramid. This merge strategy does not induce a high memory cost, thanks to the implicit encoding of the combinatorial pyramid [10]. An explicit construction of all the reduced graphs using graph or dual graph pyramids would require a huge amount of memory, with a lot of redundancy between graphs.
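The merge loop can be illustrated on a one-dimensional toy, an assumption made only to keep the example short: regions are intervals of a signal, adjacent intervals share a boundary of length 1, and the energy is the squared error plus $\nu$ times the total boundary length. At each level the adjacent pair with the lowest energy variation is merged, even when that variation is positive, so the loop continues past local minima until a single region remains; the recorded history plays the role of the pyramid levels. A real implementation would operate on the combinatorial map and cache the variations instead of recomputing them all at every step.

```python
def sse(values):
    """Squared error of a region around its mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def greedy_pyramid(signal, nu):
    """Merge adjacent regions of a 1D signal two by two until one region remains.

    dE = SSE(A u B) - SSE(A) - SSE(B) - nu (one unit boundary disappears).
    Returns one (index of the left merged region, dE) pair per pyramid level.
    """
    regions = [[v] for v in signal]
    history = []
    while len(regions) > 1:
        deltas = [sse(regions[i] + regions[i + 1]) - sse(regions[i]) - sse(regions[i + 1]) - nu
                  for i in range(len(regions) - 1)]
        i = min(range(len(deltas)), key=deltas.__getitem__)  # first pair with lowest dE
        regions[i:i + 2] = [regions[i] + regions[i + 1]]
        history.append((i, deltas[i]))
    return history


print(greedy_pyramid([0.0, 0.0, 10.0, 10.0], nu=1.0))
# [(0, -1.0), (1, -1.0), (0, 99.0)]
```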

5 Conclusion 

We have presented a new framework for segmenting images with a pyramidal bottom-up approach using an energy-minimizing scheme. Our framework combines combinatorial pyramids, which can represent in the same structure all the levels of a hierarchy, and discrete geometric estimators, which provide precise geometric measurements and allow the definition of new regularization and image energy terms. A greedy algorithm for computing the hierarchy was also provided and some examples of segmentation were exhibited and discussed.

Our first experiments show that the length estimation can have a great influence on the regularization of the segmentation. Discrete geometric estimators provide smoother boundaries. However, they are useless if the over-segmentation gives irregular regions. In future work, we want to tackle this problem by using a smoother over-segmentation.